Dataset statistics
| Number of variables | 9 |
|---|---|
| Number of observations | 245955 |
| Missing cells | 6304 |
| Missing cells (%) | 0.3% |
| Duplicate rows | 1 |
| Duplicate rows (%) | < 0.1% |
| Total size in memory | 16.9 MiB |
| Average record size in memory | 72.0 B |
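The percentage and per-record figures in the table above can be reproduced from the raw counts. A minimal sketch of that arithmetic, assuming "Missing cells (%)" is the number of missing cells divided by rows × columns and that 1 MiB is 2^20 bytes (variable names are illustrative only):

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 9
n_observations = 245955
missing_cells = 6304
total_size_mib = 16.9

# Share of missing cells across the whole rows x columns grid.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size: total memory divided by the number of rows.
# Assumes 1 MiB = 2**20 bytes.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions both results round to the values reported in the table (0.3% and 72.0 B).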
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 1 |
Alerts
| Dataset has 1 (< 0.1%) duplicate row | Duplicates |
|---|---|
| fare is highly correlated overall with fare_lg and 2 other fields | High correlation |
| fare_lg is highly correlated overall with fare and 1 other field | High correlation |
| fare_low is highly correlated overall with fare and 1 other field | High correlation |
| nsmiles is highly correlated overall with fare | High correlation |
| passengers has 7439 (3.0%) zeros | Zeros |
Reproduction
| Analysis started | 2024-11-28 23:15:57.372306 |
|---|---|
| Analysis finished | 2024-11-28 23:16:07.627442 |
| Duration | 10.26 seconds |
| Software version | ydata-profiling v0.0.dev0 |
| Download configuration | config.json |
Year
Real number (ℝ)
| Distinct | 31 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2008.5241 |
| Minimum | 1993 |
| Maximum | 2024 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 1993 |
|---|---|
| 5-th percentile | 1996 |
| Q1 | 2001 |
| median | 2008 |
| Q3 | 2016 |
| 95-th percentile | 2022 |
| Maximum | 2024 |
| Range | 31 |
| Interquartile range (IQR) | 15 |
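In the table above, Range is Maximum minus Minimum (2024 − 1993 = 31) and IQR is Q3 minus Q1 (2016 − 2001 = 15). The sketch below shows one way such a summary could be computed with the Python standard library. The helper name `quantile_summary` is hypothetical, the data is a toy list, and the choice of linear interpolation (`method="inclusive"`) is an assumption rather than something the report states.

```python
import statistics

def quantile_summary(values):
    """Return the quantile statistics listed in the report for one column."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between order statistics and
    # treats the minimum and maximum as the 0th and 100th percentiles.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    twentieths = statistics.quantiles(data, n=20, method="inclusive")
    return {
        "min": data[0],
        "p5": twentieths[0],     # 5th percentile
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": twentieths[-1],   # 95th percentile
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }

# Toy data, not taken from the dataset.
summary = quantile_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```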
Descriptive statistics
| Standard deviation | 8.7033645 |
|---|---|
| Coefficient of variation (CV) | 0.0043332138 |
| Kurtosis | -1.1416162 |
| Mean | 2008.5241 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | -0.0091655819 |
| Sum | 4.9400655 × 10⁸ |
| Variance | 75.748553 |
| Monotonicity | Not monotonic |
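Several entries in the descriptive statistics are tied to each other by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum divided by the number of observations gives the mean. A short consistency check on the Year figures, reading the Sum entry as 4.9400655 × 10^8 (variable names are illustrative):

```python
# Values copied from the Year tables in this report.
mean = 2008.5241
std_dev = 8.7033645
total = 4.9400655e8
n_observations = 245955

cv = std_dev / mean                      # coefficient of variation
variance = std_dev ** 2                  # squared standard deviation
mean_from_sum = total / n_observations   # should agree with the mean

print(f"CV = {cv:.10f}")
print(f"Variance = {variance:.6f}")
print(f"Sum / n = {mean_from_sum:.4f}")
```

Small differences in the last digits are expected, because the inputs are themselves rounded to eight significant figures.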
Histogram with fixed size bins (bins=31)
| Value | Count | Frequency (%) |
| 1993 | 9739 | 4.0% |
| 1996 | 9081 | 3.7% |
| 1997 | 8949 | 3.6% |
| 1999 | 8757 | 3.6% |
| 1998 | 8708 | 3.5% |
| 2001 | 8648 | 3.5% |
| 2002 | 8589 | 3.5% |
| 2000 | 8541 | 3.5% |
| 2003 | 8488 | 3.5% |
| 2004 | 8466 | 3.4% |
| Other values (21) | 157989 |
| Value | Count | Frequency (%) |
| 1993 | 9739 | |
| 1994 | 2454 | 1.0% |
| 1996 | 9081 | |
| 1997 | 8949 | |
| 1998 | 8708 | |
| 1999 | 8757 | |
| 2000 | 8541 | |
| 2001 | 8648 | |
| 2002 | 8589 | |
| 2003 | 8488 |
| Value | Count | Frequency (%) |
| 2024 | 1905 | 0.8% |
| 2023 | 7788 | |
| 2022 | 7809 | |
| 2021 | 7758 | |
| 2020 | 7520 | |
| 2019 | 8148 | |
| 2018 | 8195 | |
| 2017 | 8232 | |
| 2016 | 8227 | |
| 2015 | 8150 |
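Several Frequency (%) cells in the value tables are blank. Assuming each frequency is simply the count divided by the 245955 observations, formatted to one decimal place, the blanks can be recomputed from the counts. The helper `frequency_pct` below is a hypothetical name, and the sketch does not reproduce the report's "< 0.1%" shorthand for very small shares.

```python
N_OBSERVATIONS = 245955  # "Number of observations" from the dataset statistics

# Counts copied from the table of the ten largest Year values.
counts = {
    2024: 1905, 2023: 7788, 2022: 7809, 2021: 7758, 2020: 7520,
    2019: 8148, 2018: 8195, 2017: 8232, 2016: 8227, 2015: 8150,
}

def frequency_pct(count, total=N_OBSERVATIONS):
    """Format a count as a percentage of all rows, to one decimal place."""
    return f"{100 * count / total:.1f}%"

frequencies = {year: frequency_pct(c) for year, c in counts.items()}
for year, pct in frequencies.items():
    print(year, pct)
```

The rows whose frequency is printed in the report (for example 2024 at 0.8% and 1993 at 4.0%) can serve as a check on the formula.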
quarter
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.7 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 245955 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
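As a rough illustration of how such character properties can be looked up, the sketch below uses Python's standard `unicodedata` module on the four quarter values and their counts as listed in this section. It covers only the general category; the standard library does not expose script or block names, so those would need another source.

```python
import unicodedata
from collections import Counter

# Counts copied from the "Common Values" table for quarter.
value_counts = {"1": 63894, "3": 61204, "2": 60587, "4": 60270}

# Tally characters weighted by how often each value occurs.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_characters = sum(char_counts.values())
distinct_characters = len(char_counts)
# General category of each code point, e.g. "Nd" for a decimal digit.
categories = {unicodedata.category(ch) for ch in char_counts}

print(total_characters, distinct_characters, categories)
```

Because every value is a single character, the total character count equals the number of observations.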
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 3 |
|---|---|
| 2nd row | 3 |
| 3rd row | 3 |
| 4th row | 3 |
| 5th row | 3 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 63894 | |
| 3 | 61204 | |
| 2 | 60587 | |
| 4 | 60270 |
Length
[Figure: histogram of lengths of the category]
Common Values (Plot)
[Figure: plot of the common values]
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 63894 | |
| 3 | 61204 | |
| 2 | 60587 | |
| 4 | 60270 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 245955 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 1 | 63894 | |
| 3 | 61204 | |
| 2 | 60587 | |
| 4 | 60270 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 245955 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 1 | 63894 | |
| 3 | 61204 | |
| 2 | 60587 | |
| 4 | 60270 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 245955 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 1 | 63894 | |
| 3 | 61204 | |
| 2 | 60587 | |
| 4 | 60270 |
nsmiles
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 1155 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1189.8123 |
| Minimum | 109 |
| Maximum | 2724 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 109 |
|---|---|
| 5-th percentile | 285 |
| Q1 | 626 |
| median | 1023 |
| Q3 | 1736 |
| 95-th percentile | 2510 |
| Maximum | 2724 |
| Range | 2615 |
| Interquartile range (IQR) | 1110 |
Descriptive statistics
| Standard deviation | 703.14347 |
|---|---|
| Coefficient of variation (CV) | 0.59097007 |
| Kurtosis | -0.840847 |
| Mean | 1189.8123 |
| Median Absolute Deviation (MAD) | 481 |
| Skewness | 0.56263484 |
| Sum | 2.9264029 × 10⁸ |
| Variance | 494410.74 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2510 | 3221 | 1.3% |
| 2619 | 2038 | 0.8% |
| 1246 | 1902 | 0.8% |
| 2329 | 1818 | 0.7% |
| 773 | 1719 | 0.7% |
| 2611 | 1705 | 0.7% |
| 372 | 1629 | 0.7% |
| 1139 | 1621 | 0.7% |
| 1465 | 1597 | 0.6% |
| 209 | 1465 | 0.6% |
| Other values (1145) | 227240 |
| Value | Count | Frequency (%) |
| 109 | 15 | < 0.1% |
| 115 | 8 | < 0.1% |
| 122 | 82 | |
| 129 | 2 | < 0.1% |
| 130 | 67 | |
| 133 | 40 | < 0.1% |
| 137 | 29 | < 0.1% |
| 145 | 9 | < 0.1% |
| 148 | 128 | |
| 155 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 2724 | 236 | 0.1% |
| 2704 | 1062 | |
| 2700 | 230 | 0.1% |
| 2636 | 323 | 0.1% |
| 2629 | 4 | < 0.1% |
| 2625 | 336 | 0.1% |
| 2619 | 2038 | |
| 2611 | 1705 | |
| 2608 | 64 | < 0.1% |
| 2588 | 354 | 0.1% |
passengers
Real number (ℝ)
ZEROS 
| Distinct | 3883 |
|---|---|
| Distinct (%) | 1.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 299.47679 |
| Minimum | 0 |
| Maximum | 8301 |
| Zeros | 7439 |
| Zeros (%) | 3.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 21 |
| median | 113 |
| Q3 | 339 |
| 95-th percentile | 1293 |
| Maximum | 8301 |
| Range | 8301 |
| Interquartile range (IQR) | 318 |
Descriptive statistics
| Standard deviation | 511.38949 |
|---|---|
| Coefficient of variation (CV) | 1.7076097 |
| Kurtosis | 22.475975 |
| Mean | 299.47679 |
| Median Absolute Deviation (MAD) | 105 |
| Skewness | 3.8306029 |
| Sum | 73657815 |
| Variance | 261519.21 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 7605 | 3.1% |
| 0 | 7439 | 3.0% |
| 2 | 5434 | 2.2% |
| 3 | 4297 | 1.7% |
| 4 | 3752 | 1.5% |
| 5 | 3285 | 1.3% |
| 6 | 2776 | 1.1% |
| 7 | 2616 | 1.1% |
| 8 | 2551 | 1.0% |
| 9 | 2301 | 0.9% |
| Other values (3873) | 203899 |
| Value | Count | Frequency (%) |
| 0 | 7439 | |
| 1 | 7605 | |
| 2 | 5434 | |
| 3 | 4297 | |
| 4 | 3752 | |
| 5 | 3285 | |
| 6 | 2776 | 1.1% |
| 7 | 2616 | 1.1% |
| 8 | 2551 | 1.0% |
| 9 | 2301 | 0.9% |
| Value | Count | Frequency (%) |
| 8301 | 1 | |
| 8103 | 1 | |
| 8023 | 1 | |
| 7857 | 1 | |
| 7718 | 1 | |
| 7661 | 1 | |
| 7555 | 1 | |
| 7553 | 1 | |
| 7469 | 1 | |
| 7390 | 1 |
fare
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 36323 |
|---|---|
| Distinct (%) | 14.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 218.97959 |
| Minimum | 50 |
| Maximum | 3377 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 50 |
|---|---|
| 5-th percentile | 107.27 |
| Q1 | 164.62 |
| median | 209.32 |
| Q3 | 262.89 |
| 95-th percentile | 354.2 |
| Maximum | 3377 |
| Range | 3327 |
| Interquartile range (IQR) | 98.27 |
Descriptive statistics
| Standard deviation | 82.372486 |
|---|---|
| Coefficient of variation (CV) | 0.37616513 |
| Kurtosis | 32.454488 |
| Mean | 218.97959 |
| Median Absolute Deviation (MAD) | 48.59 |
| Skewness | 2.3810786 |
| Sum | 53859124 |
| Variance | 6785.2264 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 209 | 33 | < 0.1% |
| 182.48 | 31 | < 0.1% |
| 201.14 | 30 | < 0.1% |
| 175 | 29 | < 0.1% |
| 178 | 28 | < 0.1% |
| 192 | 28 | < 0.1% |
| 209.33 | 27 | < 0.1% |
| 170 | 27 | < 0.1% |
| 231 | 27 | < 0.1% |
| 191.95 | 27 | < 0.1% |
| Other values (36313) | 245668 |
| Value | Count | Frequency (%) |
| 50 | 1 | |
| 50.4 | 1 | |
| 50.41 | 1 | |
| 50.5 | 1 | |
| 50.72 | 1 | |
| 50.8 | 2 | |
| 50.96 | 1 | |
| 50.98 | 2 | |
| 50.99 | 1 | |
| 51 | 2 |
| Value | Count | Frequency (%) |
| 3377 | 1 | |
| 2716 | 1 | |
| 2628.9 | 1 | |
| 2104.9 | 1 | |
| 2074 | 1 | |
| 2034.35 | 1 | |
| 1991 | 1 | |
| 1950 | 1 | |
| 1871 | 1 | |
| 1841.7 | 1 |
large_ms
Real number (ℝ)
| Distinct | 7367 |
|---|---|
| Distinct (%) | 3.0% |
| Missing | 1540 |
| Missing (%) | 0.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.66525163 |
| Minimum | 0.0038 |
| Maximum | 1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 0.0038 |
|---|---|
| 5-th percentile | 0.31 |
| Q1 | 0.48 |
| median | 0.6524 |
| Q3 | 0.8719 |
| 95-th percentile | 1 |
| Maximum | 1 |
| Range | 0.9962 |
| Interquartile range (IQR) | 0.3919 |
Descriptive statistics
| Standard deviation | 0.22463466 |
|---|---|
| Coefficient of variation (CV) | 0.3376687 |
| Kurtosis | -1.164393 |
| Mean | 0.66525163 |
| Median Absolute Deviation (MAD) | 0.1924 |
| Skewness | -0.038183459 |
| Sum | 162597.48 |
| Variance | 0.050460729 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 13803 | 5.6% |
| 0.99 | 4691 | 1.9% |
| 0.5 | 4484 | 1.8% |
| 0.98 | 3715 | 1.5% |
| 0.97 | 3359 | 1.4% |
| 0.96 | 3031 | 1.2% |
| 0.66 | 2871 | 1.2% |
| 0.6 | 2780 | 1.1% |
| 0.52 | 2738 | 1.1% |
| 0.51 | 2725 | 1.1% |
| Other values (7357) | 200218 |
| Value | Count | Frequency (%) |
| 0.0038 | 1 | < 0.1% |
| 0.0052 | 1 | < 0.1% |
| 0.0074 | 2 | < 0.1% |
| 0.0077 | 1 | < 0.1% |
| 0.008 | 1 | < 0.1% |
| 0.0081 | 1 | < 0.1% |
| 0.01 | 15 | |
| 0.02 | 10 | |
| 0.03 | 8 | |
| 0.04 | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 13803 | |
| 0.9999 | 2 | < 0.1% |
| 0.9998 | 24 | < 0.1% |
| 0.9997 | 33 | < 0.1% |
| 0.9996 | 37 | < 0.1% |
| 0.9995 | 35 | < 0.1% |
| 0.9994 | 38 | < 0.1% |
| 0.9993 | 31 | < 0.1% |
| 0.9992 | 45 | < 0.1% |
| 0.9991 | 31 | < 0.1% |
fare_lg
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 37508 |
|---|---|
| Distinct (%) | 15.3% |
| Missing | 1540 |
| Missing (%) | 0.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 218.71096 |
| Minimum | 50 |
| Maximum | 2725.6 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 50 |
|---|---|
| 5-th percentile | 102.917 |
| Q1 | 161.5 |
| median | 208.03 |
| Q3 | 263.64 |
| 95-th percentile | 364.793 |
| Maximum | 2725.6 |
| Range | 2675.6 |
| Interquartile range (IQR) | 102.14 |
Descriptive statistics
| Standard deviation | 84.674363 |
|---|---|
| Coefficient of variation (CV) | 0.38715189 |
| Kurtosis | 14.627054 |
| Mean | 218.71096 |
| Median Absolute Deviation (MAD) | 50.43 |
| Skewness | 1.7053114 |
| Sum | 53456240 |
| Variance | 7169.7477 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 180 | 29 | < 0.1% |
| 205.46 | 29 | < 0.1% |
| 171 | 29 | < 0.1% |
| 183 | 28 | < 0.1% |
| 164.47 | 27 | < 0.1% |
| 208.98 | 27 | < 0.1% |
| 228 | 27 | < 0.1% |
| 147 | 26 | < 0.1% |
| 167 | 26 | < 0.1% |
| 171.55 | 26 | < 0.1% |
| Other values (37498) | 244141 | |
| (Missing) | 1540 | 0.6% |
| Value | Count | Frequency (%) |
| 50 | 1 | |
| 50.4 | 1 | |
| 50.41 | 1 | |
| 50.5 | 1 | |
| 50.65 | 1 | |
| 50.72 | 1 | |
| 50.8 | 2 | |
| 50.96 | 1 | |
| 50.98 | 2 | |
| 50.99 | 1 |
| Value | Count | Frequency (%) |
| 2725.6 | 1 | |
| 2710.9 | 1 | |
| 1897.7 | 1 | |
| 1664 | 1 | |
| 1661 | 1 | |
| 1582.6 | 1 | |
| 1560.8 | 1 | |
| 1501.42 | 1 | |
| 1420.6 | 1 | |
| 1383.4 | 1 |
lf_ms
Real number (ℝ)
| Distinct | 9687 |
|---|---|
| Distinct (%) | 4.0% |
| Missing | 1612 |
| Missing (%) | 0.7% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.45043751 |
| Minimum | 0.01 |
| Maximum | 1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 0.01 |
|---|---|
| 5-th percentile | 0.0253 |
| Q1 | 0.158 |
| median | 0.36 |
| Q3 | 0.75 |
| 95-th percentile | 1 |
| Maximum | 1 |
| Range | 0.99 |
| Interquartile range (IQR) | 0.592 |
Descriptive statistics
| Standard deviation | 0.33266903 |
|---|---|
| Coefficient of variation (CV) | 0.73854646 |
| Kurtosis | -1.2506742 |
| Mean | 0.45043751 |
| Median Absolute Deviation (MAD) | 0.24 |
| Skewness | 0.43051319 |
| Sum | 110061.25 |
| Variance | 0.11066868 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 13803 | 5.6% |
| 0.01 | 6212 | 2.5% |
| 0.1 | 5593 | 2.3% |
| 0.11 | 5124 | 2.1% |
| 0.12 | 4703 | 1.9% |
| 0.99 | 4690 | 1.9% |
| 0.13 | 4348 | 1.8% |
| 0.02 | 3858 | 1.6% |
| 0.14 | 3846 | 1.6% |
| 0.15 | 3715 | 1.5% |
| Other values (9677) | 188451 |
| Value | Count | Frequency (%) |
| 0.01 | 6212 | |
| 0.0101 | 18 | < 0.1% |
| 0.0102 | 21 | < 0.1% |
| 0.0103 | 19 | < 0.1% |
| 0.0104 | 15 | < 0.1% |
| 0.0105 | 22 | < 0.1% |
| 0.0106 | 22 | < 0.1% |
| 0.0107 | 24 | < 0.1% |
| 0.0108 | 25 | < 0.1% |
| 0.0109 | 20 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 13803 | |
| 0.9999 | 2 | < 0.1% |
| 0.9998 | 24 | < 0.1% |
| 0.9997 | 33 | < 0.1% |
| 0.9996 | 37 | < 0.1% |
| 0.9995 | 35 | < 0.1% |
| 0.9994 | 38 | < 0.1% |
| 0.9993 | 31 | < 0.1% |
| 0.9992 | 45 | < 0.1% |
| 0.9991 | 31 | < 0.1% |
fare_low
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 32283 |
|---|---|
| Distinct (%) | 13.2% |
| Missing | 1612 |
| Missing (%) | 0.7% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 190.67594 |
| Minimum | 50 |
| Maximum | 2725.6 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 50 |
|---|---|
| 5-th percentile | 93.19 |
| Q1 | 140.06 |
| median | 181.63 |
| Q3 | 230.04 |
| 95-th percentile | 312.57 |
| Maximum | 2725.6 |
| Range | 2675.6 |
| Interquartile range (IQR) | 89.98 |
Descriptive statistics
| Standard deviation | 73.577694 |
|---|---|
| Coefficient of variation (CV) | 0.38587823 |
| Kurtosis | 18.130088 |
| Mean | 190.67594 |
| Median Absolute Deviation (MAD) | 44.49 |
| Skewness | 1.9783875 |
| Sum | 46590331 |
| Variance | 5413.677 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 178 | 45 | < 0.1% |
| 171 | 43 | < 0.1% |
| 149 | 43 | < 0.1% |
| 180 | 43 | < 0.1% |
| 147 | 42 | < 0.1% |
| 175 | 41 | < 0.1% |
| 158 | 39 | < 0.1% |
| 202 | 39 | < 0.1% |
| 153 | 37 | < 0.1% |
| 154 | 37 | < 0.1% |
| Other values (32273) | 243934 | |
| (Missing) | 1612 | 0.7% |
| Value | Count | Frequency (%) |
| 50 | 1 | < 0.1% |
| 50.1 | 1 | < 0.1% |
| 50.4 | 2 | |
| 50.41 | 1 | < 0.1% |
| 50.5 | 2 | |
| 50.6 | 2 | |
| 50.65 | 1 | < 0.1% |
| 50.72 | 1 | < 0.1% |
| 50.8 | 3 | |
| 50.9 | 2 |
| Value | Count | Frequency (%) |
| 2725.6 | 1 | |
| 1897.7 | 1 | |
| 1664 | 1 | |
| 1420.6 | 1 | |
| 1383.4 | 1 | |
| 1336.5 | 1 | |
| 1312 | 1 | |
| 1269.78 | 1 | |
| 1268.05 | 1 | |
| 1261.5 | 1 |
| | Year | fare | fare_lg | fare_low | large_ms | lf_ms | nsmiles | passengers | quarter |
|---|---|---|---|---|---|---|---|---|---|
| Year | 1.000 | 0.194 | 0.193 | 0.216 | 0.103 | 0.107 | 0.023 | 0.107 | 0.037 |
| fare | 0.194 | 1.000 | 0.965 | 0.863 | -0.215 | -0.215 | 0.521 | -0.267 | 0.014 |
| fare_lg | 0.193 | 0.965 | 1.000 | 0.821 | -0.204 | -0.266 | 0.492 | -0.216 | 0.016 |
| fare_low | 0.216 | 0.863 | 0.821 | 1.000 | -0.116 | 0.052 | 0.428 | -0.300 | 0.011 |
| large_ms | 0.103 | -0.215 | -0.204 | -0.116 | 1.000 | 0.410 | -0.408 | -0.083 | 0.007 |
| lf_ms | 0.107 | -0.215 | -0.266 | 0.052 | 0.410 | 1.000 | -0.237 | -0.207 | 0.006 |
| nsmiles | 0.023 | 0.521 | 0.492 | 0.428 | -0.408 | -0.237 | 1.000 | -0.103 | 0.009 |
| passengers | 0.107 | -0.267 | -0.216 | -0.300 | -0.083 | -0.207 | -0.103 | 1.000 | 0.011 |
| quarter | 0.037 | 0.014 | 0.016 | 0.011 | 0.007 | 0.006 | 0.009 | 0.011 | 1.000 |
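The correlation table above can be scanned mechanically for strongly related pairs. The sketch below flags every pair whose absolute coefficient reaches a cutoff. The cutoff of 0.5 is an assumption (the report does not state one), and `highly_correlated` is a hypothetical helper name. With that cutoff, the flagged pairs coincide with the four high-correlation alerts listed near the top of this report.

```python
COLUMNS = ["Year", "fare", "fare_lg", "fare_low", "large_ms",
           "lf_ms", "nsmiles", "passengers", "quarter"]

# Rows copied from the correlation table, in the same column order.
MATRIX = [
    [1.000, 0.194, 0.193, 0.216, 0.103, 0.107, 0.023, 0.107, 0.037],
    [0.194, 1.000, 0.965, 0.863, -0.215, -0.215, 0.521, -0.267, 0.014],
    [0.193, 0.965, 1.000, 0.821, -0.204, -0.266, 0.492, -0.216, 0.016],
    [0.216, 0.863, 0.821, 1.000, -0.116, 0.052, 0.428, -0.300, 0.011],
    [0.103, -0.215, -0.204, -0.116, 1.000, 0.410, -0.408, -0.083, 0.007],
    [0.107, -0.215, -0.266, 0.052, 0.410, 1.000, -0.237, -0.207, 0.006],
    [0.023, 0.521, 0.492, 0.428, -0.408, -0.237, 1.000, -0.103, 0.009],
    [0.107, -0.267, -0.216, -0.300, -0.083, -0.207, -0.103, 1.000, 0.011],
    [0.037, 0.014, 0.016, 0.011, 0.007, 0.006, 0.009, 0.011, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff, not stated in the report

def highly_correlated(columns, matrix, threshold=THRESHOLD):
    """Map each column to the other columns with |correlation| >= threshold."""
    flagged = {}
    for i, name in enumerate(columns):
        partners = [columns[j] for j, value in enumerate(matrix[i])
                    if j != i and abs(value) >= threshold]
        if partners:
            flagged[name] = partners
    return flagged

flagged = highly_correlated(COLUMNS, MATRIX)
for name, partners in flagged.items():
    print(f"{name}: {', '.join(partners)}")
```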
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
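One way to read "nullity correlation" is as an ordinary Pearson correlation between the missing/present indicators of two columns. The sketch below follows that reading on toy data. The helper names are hypothetical, and this is not necessarily how the report's heatmap was produced.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if a column is fully present or fully missing.
    return cov / math.sqrt(var_x * var_y)

def nullity_correlation(col_a, col_b):
    """Correlate the missing/present indicators of two columns."""
    missing_a = [1 if v is None else 0 for v in col_a]
    missing_b = [1 if v is None else 0 for v in col_b]
    return pearson(missing_a, missing_b)

# Toy columns, not taken from the dataset; None marks a missing cell.
a = [0.5, None, 0.7, None, 0.9, 1.0]
b = [210.0, None, 180.0, None, 95.0, 120.0]  # missing exactly where a is missing
c = [None, 3.0, None, 4.0, None, None]       # missing exactly where a is present

print(nullity_correlation(a, b))
print(nullity_correlation(a, c))
```

A value near 1 means two columns tend to be missing on the same rows, and a value near −1 means one tends to be present where the other is missing. In this report, large_ms and fare_lg each have 1540 missing values and lf_ms and fare_low each have 1612, which is consistent with, but does not prove, each pair being missing together.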
First rows
| | Year | quarter | nsmiles | passengers | fare | large_ms | fare_lg | lf_ms | fare_low |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021 | 3 | 970 | 180 | 81.43 | 1.0000 | 81.43 | 1.0000 | 81.43 |
| 1 | 2021 | 3 | 970 | 19 | 208.93 | 0.4659 | 219.98 | 0.1193 | 154.11 |
| 2 | 2021 | 3 | 580 | 204 | 184.56 | 0.9968 | 184.44 | 0.9968 | 184.44 |
| 3 | 2021 | 3 | 580 | 264 | 182.64 | 0.9774 | 183.09 | 0.9774 | 183.09 |
| 4 | 2021 | 3 | 328 | 398 | 177.11 | 0.6061 | 184.49 | 0.3939 | 165.77 |
| 5 | 2021 | 3 | 1974 | 153 | 324.97 | 0.4263 | 323.73 | 0.1609 | 298.20 |
| 6 | 2021 | 3 | 1974 | 16 | 315.90 | 0.7285 | 270.42 | 0.7285 | 270.42 |
| 7 | 2021 | 3 | 1974 | 22 | 329.22 | 0.5415 | 271.60 | 0.5415 | 271.60 |
| 8 | 2021 | 3 | 1670 | 159 | 255.89 | 0.7212 | 244.89 | 0.7212 | 244.89 |
| 9 | 2021 | 3 | 1670 | 151 | 291.16 | 0.4404 | 296.88 | 0.3197 | 247.20 |
Last rows
| | Year | quarter | nsmiles | passengers | fare | large_ms | fare_lg | lf_ms | fare_low |
|---|---|---|---|---|---|---|---|---|---|
| 245945 | 2024 | 1 | 464 | 184 | 235.68 | 0.9463 | 229.01 | 0.9463 | 229.01 |
| 245946 | 2024 | 1 | 464 | 62 | 231.34 | 0.9482 | 224.58 | 0.9482 | 224.58 |
| 245947 | 2024 | 1 | 665 | 99 | 183.51 | 0.5657 | 97.38 | 0.5657 | 97.38 |
| 245948 | 2024 | 1 | 665 | 5 | 332.42 | 0.7442 | 310.57 | 0.7442 | 310.57 |
| 245949 | 2024 | 1 | 665 | 8 | 280.76 | 0.5658 | 254.62 | 0.5658 | 254.62 |
| 245950 | 2024 | 1 | 665 | 207 | 278.70 | 0.7503 | 287.44 | 0.2359 | 248.46 |
| 245951 | 2024 | 1 | 724 | 277 | 148.69 | 0.8255 | 114.45 | 0.8255 | 114.45 |
| 245952 | 2024 | 1 | 724 | 70 | 330.19 | 0.8057 | 321.92 | 0.8057 | 321.92 |
| 245953 | 2024 | 1 | 550 | 178 | 95.65 | 1.0000 | 95.65 | 1.0000 | 95.65 |
| 245954 | 2024 | 1 | 550 | 57 | 330.15 | 0.5212 | 288.38 | 0.5212 | 288.38 |
Most frequently occurring
| | Year | quarter | nsmiles | passengers | fare | large_ms | fare_lg | lf_ms | fare_low | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1999 | 2 | 248 | 0 | 92.0 | 1.0 | 92.0 | 1.0 | 92.0 | 2 |
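The table above lists one row that occurs twice. A minimal sketch of how repeated rows could be counted with the standard library. The `duplicate_rows` helper is hypothetical, the rows are a toy sample built around the repeated row shown above, and rows containing missing values would need extra care that is not handled here.

```python
from collections import Counter

def duplicate_rows(rows):
    """Return {row: occurrences} for rows that appear more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# Toy rows; the repeated one mirrors the row shown in the table above.
rows = [
    (1999, 2, 248, 0, 92.0, 1.0, 92.0, 1.0, 92.0),
    (2021, 3, 970, 180, 81.43, 1.0, 81.43, 1.0, 81.43),
    (1999, 2, 248, 0, 92.0, 1.0, 92.0, 1.0, 92.0),
]

dupes = duplicate_rows(rows)
# Count each repeated row once, matching one reading of "Duplicate rows | 1".
n_duplicate_rows = len(dupes)
print(dupes, n_duplicate_rows)
```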